Abstract Emotion analysis and sentiment prediction deal with identifying and classifying different
emotions and sentiments based on the source data. Microblogging sites such as Twitter generate 
a massive amount of data incorporating other feelings and opinions in the context of tweets, 
posts, and status updates. However, it is challenging to identify and classify the emotions from 
the tweets because of restricted word length and data diversity. It requires an efficient classification 
and prediction technique for analyzing different emotions and sentiments from the Twitter data. 
This research proposes an Enhanced Deep Neural Network (EDNN) based hierarchical Bi-LSTM
model for emotion analysis and sentiment prediction. The data for analysis was obtained from
the Airline Twitter Sentiment dataset and further processed to extract relevant features for
emotion analysis and sentiment prediction. The proposed hierarchical Bi-LSTM model was used to
classify five emotions: sadness, love, joy, fear, and anger, along with three sentiment forms: positive, 
negative, and neutral conditions. Compared to the traditional hybrid CNN-LSTM approach, the
emotion analysis and sentiment prediction results indicate that the proposed method has superior 
accuracy, recall, precision, and F-Score. 





Keywords emotion analysis - sentiment prediction - machine learning - deep learning - hierarchical 
Bi-LSTM 


1 Introduction 


In recent times, the evolution of social media platforms has increased textual data generation. 
Social media platforms help in creating virtual bonds between multiple users by allowing them to 
express their opinions and share their ideas with other users instantly (Öztürk & Ayvaz, 2018)
[17]. Twitter is one such popular social media platform that is growing at a fast pace. Twitter
is a micro-blogging site where users can create posts using short texts called tweets. This
platform has significantly transformed conventional text-sharing platforms. The
tweets shared by users are used to collect important and valuable information that can be
shared with multiple users (Valenzuela et al., 2017) [29]. The empirical analysis of the semantic
information obtained from the Twitter data is called Sentiment Analysis (SA) (Diamantini et al.,
2019) [6]. SA is typically a method used to identify and categorize the emotions and opinions of 
a given text depending on the text’s polarity (Coletta et al., 2014) [5]. The primary goal of the 
SA is to determine whether the reader has positive or negative sentiment based on the standard 
categorization (Tellez et al., 2017) [26]. The process of sentiment analysis is presented in Figure 1.

Fig. 1 Process Flow of Sentiment Analysis

In general, sentiment analysis is categorized under classification tasks, which are performed using
various supervised and unsupervised machine learning techniques (Gautam & Yadav, 2014) [9].
These techniques achieve superior efficiency for segmentation, classification of opinions, emotions, 
and sentiments from the textual data (Rohit & Jagdale, 2018) [21]. Analysis of emotions during 
critical circumstances can be challenging since these situations possess high uncertainty and mixed 
feelings. Existing research works on the classification of different sentiments from short texts employ 
two techniques mainly: opinion lexicon (Eskander & Rambow, 2015) [8] and natural language 
processing (NLP) (Chong et al., 2014) [4]. The majority of the semantic analysis approaches use 
sentiment lexicons to recognize emotional keywords for classification. However, it is not practical
to analyze sentiment based on a few emotional keywords, since these keywords do not represent
the sentiment of an entire sentence. Besides, important factors such as the logical relations
between words might be neglected. Hence, in most of the existing approaches,
the semantic relations and the cognitive factors used for producing the sentiments are given minor
prominence. The evolution of Machine Learning (ML) and Deep Learning (DL) techniques offers
potential solutions to these problems and improves the accuracy of sentiment classification (Tripathy
et al., 2015) [27] (Untawale & Choudhari, 2019) [28]. Considering this, the proposed research aims
to effectively analyze the emotions of the users based on textual data from social media sites and to 
predict sentiments using machine learning algorithms. This research uses an ML-based hybrid CNN- 
LSTM as an existing approach for comparative analysis and proposes EDNN based hierarchical 
Bi-LSTM model for emotion analysis and sentiment prediction. The remainder of the article is
organized as follows: Section 2 gives a concise review of related research works done on emotion
analysis and sentiment prediction using various ML approaches. Section 3 discusses the research
methodology. Section 4 provides a detailed explanation of the proposed Hierarchical Bi-LSTM
model used for emotion analysis and sentiment prediction from textual data. Section 5 discusses
the results of the proposed approach and the comparison with the existing hybrid Convolutional
Neural Network (CNN) approach. Section 6 outlines applications of emotion and sentiment classification, and
Section 7 concludes the paper by providing a summary of the research and prominent observations
from the results. 


1.1 Problem Statement 


Sentiment analysis and emotion recognition are significant tasks employed for identifying and 
analyzing different emotions and sentiments based on a given input. Emotions are used for analyzing 
the opinion of an individual. It is challenging to identify emotions accurately since they possess
multi-dimensional characteristics that differ from person to person. In general, emotions are expressed
either through expressions or through speech. Recently, due to the surging popularity of social media
platforms such as Twitter, textual data is being used widely for emotion and sentiment analysis.
Since most people use these platforms for sharing their feelings and opinions, it is easier to analyze
the sentiment of the people by evaluating their texts. The texts also contain various
emoticons and images that depict emotions. By analyzing these texts, it can be inferred that there
are three main types of sentiment: positive, negative, and neutral. Various researchers have
proposed several techniques for accurate sentiment analysis and prediction.

However, emotion analysis and sentiment prediction from Twitter data has several challenges: 








— The data obtained in the form of tweets are generally short texts with incomplete
information, which makes it difficult for the classifier to identify different emotions and sentiments
based on the restricted word length.

— Twitter data is rich in diversity, i.e., it incorporates various emotions and sentiments
hidden in a single tweet. Besides, the tweets have several implicit complex features that are 
quite difficult to analyze. 

— Sentiment prediction involves many complexities since the information extracted from the 
online users possesses a certain amount of irrelevant and false data that significantly affects 
the accuracy of the prediction models. 

— Most of the approaches used for emotion analysis and sentiment prediction work effectively for 
a limited number of emotions resulting in inaccurate predictions. Unwanted features and false 
data deteriorate the efficiency of the classification models. 


1.2 Motivation 


Emotion analysis and sentiment prediction from textual data have gained vast significance in
recent times, since the prominent attributes obtained from emotion and sentiment analysis
can be employed in a variety of applications, such as decision making, psychological processes, and
opinion collection for political promotion and product marketing. Due to the enormous textual
data generation, social media platforms have become prominent sources for emotion analysis and 
sentiment prediction. Primarily, microblogging sites such as Twitter are used widely for collecting 
people’s opinions and views that are available in the form of tweets. The anonymity of Twitter 
makes it easy for people to express their original emotions and sentiments on the microblogging 
site. Various approaches use the data collected from the tweets for analyzing the different emotions 
and sentiments of the people. Many pre-existing methods for emotion analysis are based on emotion
lexicons and rely on external resources or manual pre-processing for complex
feature analysis. Based on the research challenges of the existing approaches discussed in the above
section, this research intends to propose a technique that can collect and extract different emotional
features from Twitter and accurately predict the sentiments based on the extracted features. 


1.3 Contribution 


The primary objective of this proposed work is to develop an effective emotion classification and 
sentiment prediction model based on textual data to analyze different emotions and sentiments. 
The proposed approach comprises a hierarchical Bi-LSTM model for extracting both word sequence 
and word semantic features for emotion analysis and sentiment prediction. 

The main contributions of the proposed research are summarized below:


— This research proposes a hierarchical Bi-LSTM approach for analyzing different emotions and 
predicting the polarity of the sentiments from textual data. The hierarchical Bi-LSTM combines 
two or more Bi-LSTMs and can learn complex contextual features and semantic information 
from the Twitter data. 

— The experimental analysis was based on real data from the Airline Twitter
Sentiment dataset and the TRAIN dataset, used to classify five emotions and three
sentiments.

— The findings are compared with the pre-existing CNN-LSTM model to verify the efficacy
of the proposed method.





2 Related Work 


Emotion analysis and sentiment prediction from textual data have gained vast significance in recent
years. Both tasks involve identifying an individual’s
emotions from text, speech, or action using suitable approaches. The main objective of these
approaches is to develop an intelligent model capable of perceiving, recognizing, and understanding 
human emotions. Various researchers have proposed different techniques that use classifiers for 
analyzing sentiment from textual data. This section discusses some of the major works done in this 
context. 


(Wu et al., 2020) [31] presented an ML-based sentiment prediction model for predicting
sentiment using textual information from social media sites. This research developed a DL-based
hybrid method that combines Bi-LSTM and CNN for identifying different labels of emotions
obtained from psychiatric social texts. The proposed method employs a word representation technique
for representing semantic relationships between words. The experimental results indicated that
the suggested method outperforms
other existing models using different feature extraction techniques such as Independent Component
Analysis (ICA) [13], Latent Semantic Analysis (LSA) [12], and Bag-Of-Words (BOW) [32].





(Aishwarya et al., 2019) [1] presented a novel approach for sentiment analysis using supervised
ML algorithms such as AdaBoost and MLP (neural network). The research aimed
to analyze sentiment in two stages: data extraction and data classification. The data
for the analysis was collected by gathering different people’s opinions about a particular concept
from social media. The proposed approach analyses sentiment based on the polarity of the given
texts, using the hashtags (both positive and negative) for specified content. It further
classified these texts into various smaller classes. It established a balance between the sentiment
factors using new mechanisms such as repetition of letters in a word and uppercase letters. The
proposed model’s performance was validated by using it to analyze people’s sentiment in
the US during presidential elections. Results showed the potential of the presented
technique compared to existing models.


(Chatterjee et al., 2019) [3] proposed a novel DL-based model for detecting different emotions,
such as happy, angry, and sad, from textual dialogues. The primary objective of this research was to
combine both sentiment and semantic-based representations for obtaining accurate and precise 
emotion detection. The study combined semi-automated methods for collecting large amounts 
of training data with various methods for defining emotions to train the proposed model. The 
proposed model was evaluated based on real-time dialogue-based datasets. The potential of the 
proposed approach was evaluated by examining it in practical scenarios. It was observed from the 
results that this DL-based model showed remarkable performance compared to conventional ML 
algorithms and other deep learning models. 








(Poria et al., 2018) [18] presented a multimodal sentiment analysis study and discussed the prominent
challenges associated with it. This work evaluated three DL-based models for
classifying multimodal sentiments, with each approach improving over the others. The approaches were
evaluated using multiple datasets with fixed training and testing partitions. The study also addressed
key issues that receive little attention in most approaches, such as the influence of exclusive speaker
techniques, the significance of various modalities, and the impact of generalizability in deep learning
models. The proposed approach illustrated various aspects of sentiment analysis that need to
be considered for carrying out multimodal sentiment analysis. This makes the proposed approach
a new benchmark for further investigation in the area of multimodal sentiment analysis.


(Akhtar et al., 2019) [2] proposed a practical deep learning-based multi-task learning approach
for performing emotion recognition and sentiment analysis. The study implemented three
bidirectional Gated Recurrent Unit (biGRU) networks to extract the contextual data. The multimodal
inputs such as speech, visual frames, and textual data obtained from the video datasets do not
contribute equally to the decision-making process due to uncertainties in the multimodal data. To
overcome this, the research proposed a context-level inter-modal attention approach that simultaneously
predicts the sentiment and identifies the emotions precisely. The potency of the biGRU was evaluated by
testing it on the CMU-MOSEI dataset. Results demonstrated that the developed biGRU technique
showed significant enhancement compared to the single-task approach. Additionally, the suggested
method achieved state-of-the-art results for emotion analysis and sentiment prediction.





(Poria et al., 2019) [19] presented an analysis of emotion recognition that covers the
main research challenges, various techniques, and recent technological advancements related to
emotion recognition. This research work provides a thorough review of recent progress and how 
these advancements are used to overcome the limitations of conventional emotion recognition 
models. The study summarized the effectiveness of the emotion-shift recognition model and context 
encoder capable of yielding superior efficiency and enhancing the functionalities of the task-based 
dialogues. Lastly, the study stated that fine-grained speaker-specific continuous emotion recognition 
could emerge as one of the main approaches for understanding emotions during long monologues. 


3 Research Methodology 


This research aims to analyze the emotions from the text messages from social media sites for better 
sentiment predictions through which one can understand the emotion of the user. This research uses 
an enhanced DL-based hierarchical Bi-LSTM model for emotion analysis and sentiment prediction. 
The study also uses a hybrid convolutional neural network (CNN) based model as an existing
approach. The performance of the proposed hierarchical Bi-LSTM model is compared with that of the CNN
model to validate its effectiveness. The workflow of the proposed approach is illustrated in Figure 2.

Fig. 2 Workflow of the Proposed Hierarchical Bi-LSTM Approach


3.1 Data Extraction 


Extraction of data is the initial stage that involves the selection of input data in the form of text
from the Twitter database. The dataset (tweet corpus) used for emotion analysis and
sentiment prediction was extracted from the Airline Twitter Sentiment dataset.


— Dataset for Emotion Analysis The TRAIN dataset was used, with a total of 15428 rows:
10028 rows for training and 5400 rows for testing, with five labels.

— Dataset for Sentiment Prediction A dataset with a total of 14640 rows and two columns was
considered for the analysis: 9516 rows were used for training
and 5124 rows for testing, with three labels.
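
The row counts above are consistent with holding out roughly 35% of each dataset for testing. The split ratio is not stated in the text; it is an assumption inferred from the reported counts. A minimal sketch that reproduces the numbers under that assumption (the function name `split_sizes` is illustrative):

```python
def split_sizes(total_rows, test_fraction=0.35):
    """Return (train_rows, test_rows) for a given hold-out fraction.

    The 0.35 default is an assumption inferred from the reported counts,
    not a value stated in the text.
    """
    test_rows = round(total_rows * test_fraction)
    return total_rows - test_rows, test_rows


emotion_split = split_sizes(15428)    # TRAIN dataset for emotion analysis
sentiment_split = split_sizes(14640)  # dataset for sentiment prediction
print(emotion_split, sentiment_split)
```

Under this assumption both datasets yield the training and testing sizes reported above.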


3.2 Data Preprocessing 


The data is obtained from a microblogging site (Twitter). Data pre-processing is the basic step 
of feature extraction, emotion classification, and sentiment prediction. The main operation of the 
pre-processing stage is to remove unwanted noise, inconsistency, and redundancy from the textual 
data. Generally, the obtained data will be in raw form and contains noise, incompleteness, and 
inconsistencies. Hence it is essential to perform data pre-processing to make
it suitable for further processing. Various activities such as stemming, lemmatization, and stop word
removal are performed in this stage. Pre-processing of tweets is performed using the following
steps:


— Removal of all URLs, hashtags (#content), and targets (@username).
— Removal of unnecessary numbers, symbols, and punctuation.
— Elimination of common stop words.
— Conversion of all tweets to lower case to make the dataset uniform.
— Removal of all redundant characters and words from the tweets.
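
The steps above can be sketched with regular expressions from the Python standard library. This is an illustrative sketch, not the implementation used in the study: the stop-word list, the rule for "redundant characters" (runs of three or more identical characters collapsed to two), and the rule for "redundant words" (immediate repetitions dropped) are all assumptions. Lower-casing is applied before stop-word filtering so that matching is case-insensitive.

```python
import re

# Tiny illustrative stop-word list (assumption; the study does not give its list).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "on"}


def preprocess_tweet(text):
    """Clean one tweet following the pre-processing steps listed above."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[#@]\w+", " ", text)                # remove hashtags and targets
    text = text.lower()                                 # make the text uniform
    text = re.sub(r"[^a-z\s]", " ", text)               # remove numbers, symbols, punctuation
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)          # collapse runs of 3+ identical characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    cleaned = []
    for token in tokens:                                # drop immediately repeated words
        if not cleaned or cleaned[-1] != token:
            cleaned.append(token)
    return " ".join(cleaned)


example = "@VirginAmerica The flight was sooooo late!!! http://t.co/abc #fail 123"
print(preprocess_tweet(example))
```

For the example tweet, the mention, hashtag, URL, digits, and punctuation are removed, "The" is filtered as a stop word, and "sooooo" is shortened to "soo".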


3.3 Feature Extraction and Selection 


During this phase, all the necessary features are extracted, which are useful for formulating the 
main characteristics of the text. This is advantageous in identifying valuable words from the text 
which express an opinion or a sentiment. In this stage, new feature subsets are obtained to select 
relevant and appropriate features from the text. The study uses a Bi-LSTM to extract relevant 
features from the textual data and to select appropriate features for recognizing the emotion and 
for analyzing the sentiments based on the texts. 





3.4 Emotion Recognition and Sentiment Prediction 


This study proposes an EDNN-based Hierarchical Bi-LSTM model for emotion recognition. In the 
training phase, the model will learn to map the input text to the corresponding class of emotion 
based on the data set samples. The feature vectors and related class labels will be supplied to the 
deep learning model. In the prediction process, the feature extractor is employed to obtain feature 
vectors from a new and unseen text. The model will use these feature vectors to generate prediction 
results. 








4 Hierarchical Bi-LSTM for Emotion Analysis and Sentiment Prediction 


The process flow of the proposed approach for emotion analysis and sentiment prediction is illustrated
in Figure 3. The architecture consists of two modules, namely the data preparation module and the
emotion and sentiment extraction module. Initially, the obtained dataset is processed to make it suitable
for testing. The obtained dataset is then classified into subjective and neutral data. Subjective
textual information reflects positive and/or negative sentiment, while neutral textual information 
does not incorporate any sentiment. Preparation of data is performed in the data preparation 
module of the proposed approach whereas the emotion extraction module in the model will extract 
the emotion from the processed data. 





— Data preparation module: In this module, the obtained data is processed before feeding it to 
the classifier. On Twitter, users express their opinions and emotions in the form of short texts. 
The collected tweets are subjected to cleaning wherein the tweets containing links to other 
multimedia systems and re-tweets are eliminated. Further, the tweets are labeled manually 
based on emotions. In this research, the tweets are labeled based on five different emotions 
such as sadness, love, joy, fear, and anger. Tweets containing no emotions are labeled as neutral 
tweets. As discussed in the previous section, pre-processing is performed to eliminate irrelevant 
parts from the collected tweets. Pre-processing is important since it influences the accuracy of 
the classifier. Pre-processing of tweets is performed based on Tokenization, wherein the text is 
segmented into a sequence of words, phrases, symbols, and other relevant units called tokens. 
Tokenization also eliminates certain symbols such as punctuations. 

Fig. 3 Working Diagram of Emotion Analysis and Sentiment Prediction


— Emotion and Sentiment Extraction module: This is an important module wherein the 
emotion from the text is extracted and is classified based on the five categories of emotion. 
Correspondingly, based on the extracted features, the sentiment is classified into three labels: 
negative, positive, and neutral. In this approach, the tweets are represented as feature vectors,
on the basis of which the classifier predicts the emotion. The emotion and sentiment extraction
module is constructed using Bi-LSTM. The LSTM technique in hierarchical Bi-LSTM enhances
the capacity for learning long-term sequential data and is more appropriate for evaluating
the time-dependent word sequence. The detailed architecture of the hierarchical Bi-LSTM for 
emotion analysis and sentiment prediction is discussed in the below sections. 
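
How tokenized tweets become fixed-length inputs for the classifier can be sketched as follows. The text does not specify the encoding, so the integer-index vocabulary, the `<pad>` and `<unk>` entries, and the helper names below are assumptions made for illustration only.

```python
import re


def tokenize(text):
    """Split a tweet into lower-case word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(token_lists):
    """Map each distinct token to an integer id; 0 and 1 are reserved."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or padding to max_len."""
    ids = [vocab.get(token, vocab["<unk>"]) for token in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


tweets = ["Great flight, thanks!", "Flight delayed again"]
token_lists = [tokenize(t) for t in tweets]
vocab = build_vocab(token_lists)
encoded = [encode(tokens, vocab, max_len=5) for tokens in token_lists]
print(encoded)
```

Each id sequence would then typically be mapped to word vectors before entering the Bi-LSTM; that embedding step is outside this sketch.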








4.1 Bidirectional LSTM 


The architecture of Hierarchical Bi-LSTM consists of two or more connected Bi-LSTMs. In the
proposed approach, two Bi-LSTMs are connected in forward and backward directions to constitute
the Hierarchical Bi-LSTM. This arrangement strengthens the efficiency of the system architecture for
emotion and sentiment analysis, where different emotions and sentiments are classified accurately.
Figure 4 depicts the fundamental structure of bidirectional LSTM [24]. The Bi-LSTM replicates the
primary recurrent layer in the architecture in such a way that two layers are formed side-by-side.
The input sequence is provided to the first layer as it is, and a reversed copy of the input
sequence is provided to the second layer. In each iteration, the network records the prior words stored in its
memory unit and evaluates the probability of the next word (Li & Qian, 2016) [14]. For each word
stored in the library, the Bi-LSTM allocates a probability based upon past words, identifies
the word that holds the highest probability, and stores that word in its memory. The exemplary
memory of Bi-LSTM makes it suitable for language generation as it remembers the background
of the conversation at any moment. The limitation of the traditional Recurrent Neural Network
(RNN) approach in storing long word sequences is overcome by Bi-LSTM.

Fig. 4 Fundamental Structure of Bidirectional LSTM

The Bi-LSTM model is a four-layer
neural network, and each LSTM’s memory unit consists of three gates: the input, output, and forget
gates. These gates allow the model to either retain or forget words
at any moment by regulating the flow of data through that gate. This enables the Bi-LSTM model
to track only relevant data. This mitigates the vanishing gradient problem, which helps the
system to remember data stored for a longer time. The cell state $C_t$ runs through the network, and the
LSTM gates, namely the input, forget, and output gates, control the flow of data through the Bi-LSTM via
the sigmoid function [10]. When the value of the sigmoid activation function (Su et al., 2018) [25]
is ‘1’, the data is passed completely through the gate, whereas if the value of the sigmoid
function is ‘0’, the data is blocked by the gate.
The amount of data that has to be passed through is decided by the forget gate, as defined in
equation 1 [25, 30].

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{1}$$
Here $\sigma$ is the sigmoid activation function, $x_t$ is the current element of the input word sequence,
$W_f$ is the weight matrix of the forget gate, $h_{t-1}$ is the previous hidden state, and $b_f$ is the offset
term of the forget gate. Once the data is controlled by the
forget gate, the input gate controls the new data that will be retained in the cell state $C_t$. The formula
for the input gate is:

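$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{2}$$
The current (candidate) state of the cell and the memory unit status of the cell are obtained from
equations 3 and 4, respectively.
$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \tag{3}$$

$$C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t \tag{4}$$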


Lastly, the output gate in the Bi-LSTM will be used for controlling the output of the sigmoid 
function as shown in equation 5 and hidden layer output is defined according to equation 6. 


$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{5}$$
$$h_t = o_t \cdot \tanh(C_t) \tag{6}$$


The terms b and W represent the offset terms and weight coefficient matrix respectively, tanh is 
defined as the hyperbolic tangent activation function. 
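
A single LSTM time step following equations 1 to 6 can be sketched in plain Python. This is an illustrative sketch only: vectors are Python lists, the weight matrices are lists of rows, and the dictionary keys `"f"`, `"i"`, `"c"`, `"o"` and the helper names are assumptions. A practical implementation would use a tensor library.

```python
import math


def sigmoid(values):
    """Element-wise sigmoid of a list of numbers."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def affine(weights, vector, offset):
    """Return W.v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, vector)) + b_k
            for row, b_k in zip(weights, offset)]


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W and b hold the parameters of the four transforms."""
    concat = h_prev + x_t                                  # [h_{t-1}, x_t]
    f_t = sigmoid(affine(W["f"], concat, b["f"]))          # equation 1
    i_t = sigmoid(affine(W["i"], concat, b["i"]))          # equation 2
    c_cand = [math.tanh(v) for v in affine(W["c"], concat, b["c"])]        # equation 3
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]  # equation 4
    o_t = sigmoid(affine(W["o"], concat, b["o"]))          # equation 5
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]     # equation 6
    return h_t, c_t


# With all weights and offsets at zero, every gate outputs sigmoid(0) = 0.5,
# so half of the previous cell state is retained and no new data is added.
hidden_size, input_size = 2, 3
W = {k: [[0.0] * (hidden_size + input_size) for _ in range(hidden_size)] for k in "fico"}
b = {k: [0.0] * hidden_size for k in "fico"}
h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, 1.0], W, b)
print(h_t, c_t)
```

The zero-parameter example makes the gating behaviour easy to check by hand: the forget gate scales the previous cell state and the output gate scales what is exposed as the hidden state.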
The Bi-LSTM consists of an additional layer of reverse LSTM (Backward LSTM) that reverses the 
flow of information. The forward and backward information is synthesized by the hidden layer. By 
doing so, it ensures that every cell in the LSTM can obtain the context information. The reverse 
or the backward layer of the Bi-LSTM is evaluated in the same way as that of the forward LSTM. 
However, the direction of the information flow is reversed to acquire the subsequent information at a 
particular time. The forward and backward information flow in the Bi-LSTM network is expressed
as [7]:

$$h_f = f(W_{f1} x_t + W_{f2} h_{t-1}) \tag{7}$$

$$h_b = f(W_{b1} x_t + W_{b2} h_{t+1}) \tag{8}$$

where $h_f$ and $h_b$ are the outputs of the forward and backward LSTM, respectively. The final
output of the hidden layer is given as:

$$y_i = g(W_{o1} h_f + W_{o2} h_b) \tag{9}$$
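
The two passes in equations 7 to 9 can be illustrated with a deliberately reduced sketch. To keep it short, the hidden states are scalars, $f$ is taken to be tanh, $g$ is the identity, and initial states are zero; all of these are simplifying assumptions, and the plain recurrence stands in for the full LSTM cell.

```python
import math


def bidirectional_pass(xs, w_f1, w_f2, w_b1, w_b2, w_o1, w_o2):
    """Combine a left-to-right and a right-to-left recurrence at each step."""
    n = len(xs)
    h_f = [0.0] * n
    h_b = [0.0] * n
    state = 0.0
    for t in range(n):                       # equation 7: depends on h_{t-1}
        state = math.tanh(w_f1 * xs[t] + w_f2 * state)
        h_f[t] = state
    state = 0.0
    for t in range(n - 1, -1, -1):           # equation 8: depends on h_{t+1}
        state = math.tanh(w_b1 * xs[t] + w_b2 * state)
        h_b[t] = state
    # equation 9 with g taken as the identity (assumption)
    return [w_o1 * f + w_o2 * b for f, b in zip(h_f, h_b)]


xs = [1.0, 0.0, 0.0]
forward_only = bidirectional_pass(xs, 1.0, 1.0, 1.0, 1.0, w_o1=1.0, w_o2=0.0)
backward_only = bidirectional_pass(xs, 1.0, 1.0, 1.0, 1.0, w_o1=0.0, w_o2=1.0)
print(forward_only, backward_only)
```

With a single non-zero input at the first position, the forward pass carries that information to later steps, while the backward pass only sees it at the first step. This is why combining both directions gives every position access to past and future context.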


4.2 Structure of Hierarchical Bi-LSTM for Emotion Analysis 


In the proposed approach, the Hierarchical BiLSTM will be used for classifying the emotions and 
for predicting the sentiments from the obtained textual data. The architecture of the proposed 
hierarchical Bi-LSTM model is illustrated in Figure 5.

Fig. 5 Architecture of the Hierarchical BiLSTM for Emotion Classification

The four layers of the hierarchical Bi-LSTM obtain detailed contextual information from both past
and future scenarios. Compared
to conventional Bi-LSTM, hierarchical Bi-LSTM consists of a greater number of upper layers for
extracting relevant features. As shown in Figure 5, the input sequence $\{U_1, U_2, \ldots, U_T\}$ enters the
network through the hidden layers in the forward direction, i.e., $\{a_1, a_2, \ldots, a_T\}$, for acquiring
detailed contextual information from all the past time steps, and further passes through the
hidden layers in the backward direction, $\{c_1, c_2, \ldots, c_T\}$, for acquiring detailed contextual information
from all the future time steps. At each time step, the upper hidden layers of the
hierarchical Bi-LSTM take the outputs of the lower hidden layers as their inputs for extracting
the remaining features. These upper layers are the upper forward hidden layer $\{b_1, b_2, \ldots, b_T\}$
and the upper backward hidden layer $\{d_1, d_2, \ldots, d_T\}$. Finally, the output layer of the hierarchical Bi-LSTM
integrates the hidden vectors of the two upper layers to generate a combined output. As shown in
Figure 5, each LSTM cell of the hidden layer is represented in the form of a node that has
an input gate ($i_t$), a forget gate ($f_t$), a new memory unit ($u_t$), and an output gate ($o_t$). The new memory
defines the value of the new input, and the input gate stores the new data in its cell. The forget
gate represents the data that is dumped from the cell state, and the output gate generates the
required output. The control parameter $C_t$ controls the mechanism of data storing and dumping.
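The hidden states of each layer ($a_t$, $b_t$, $c_t$, and $d_t$) of the hierarchical Bi-LSTM at each time step $t$
are calculated as follows. The hidden state $a_t$ of the first forward layer is expressed in the equations below:

$$i_t^{(a)} = \sigma(Y_i^{(a)} z_t + X_i^{(a)} a_{t-1} + b_i^{(a)}) \tag{10}$$
$$f_t^{(a)} = \sigma(Y_f^{(a)} z_t + X_f^{(a)} a_{t-1} + b_f^{(a)}) \tag{11}$$
$$o_t^{(a)} = \sigma(Y_o^{(a)} z_t + X_o^{(a)} a_{t-1} + b_o^{(a)}) \tag{12}$$
$$u_t^{(a)} = \tanh(Y_u^{(a)} z_t + X_u^{(a)} a_{t-1} + b_u^{(a)}) \tag{13}$$
$$C_t^{(a)} = i_t^{(a)} \odot u_t^{(a)} + f_t^{(a)} \odot C_{t-1}^{(a)} \tag{14}$$
$$a_t = o_t^{(a)} \odot \tanh(C_t^{(a)}) \tag{15}$$

The hidden state $b_t$ of the second forward layer is expressed in the equations below:

$$i_t^{(b)} = \sigma(Y_i^{(b)} a_t + X_i^{(b)} b_{t-1} + b_i^{(b)}) \tag{16}$$
$$f_t^{(b)} = \sigma(Y_f^{(b)} a_t + X_f^{(b)} b_{t-1} + b_f^{(b)}) \tag{17}$$
$$o_t^{(b)} = \sigma(Y_o^{(b)} a_t + X_o^{(b)} b_{t-1} + b_o^{(b)}) \tag{18}$$
$$u_t^{(b)} = \tanh(Y_u^{(b)} a_t + X_u^{(b)} b_{t-1} + b_u^{(b)}) \tag{19}$$
$$C_t^{(b)} = i_t^{(b)} \odot u_t^{(b)} + f_t^{(b)} \odot C_{t-1}^{(b)} \tag{20}$$
$$b_t = o_t^{(b)} \odot \tanh(C_t^{(b)}) \tag{21}$$

The hidden state $c_t$ of the first backward layer is expressed in the equations below:

$$i_t^{(c)} = \sigma(Y_i^{(c)} z_t + X_i^{(c)} c_{t+1} + b_i^{(c)}) \tag{22}$$
$$f_t^{(c)} = \sigma(Y_f^{(c)} z_t + X_f^{(c)} c_{t+1} + b_f^{(c)}) \tag{23}$$
$$o_t^{(c)} = \sigma(Y_o^{(c)} z_t + X_o^{(c)} c_{t+1} + b_o^{(c)}) \tag{24}$$
$$u_t^{(c)} = \tanh(Y_u^{(c)} z_t + X_u^{(c)} c_{t+1} + b_u^{(c)}) \tag{25}$$
$$C_t^{(c)} = i_t^{(c)} \odot u_t^{(c)} + f_t^{(c)} \odot C_{t+1}^{(c)} \tag{26}$$
$$c_t = o_t^{(c)} \odot \tanh(C_t^{(c)}) \tag{27}$$

The hidden state $d_t$ of the second backward layer is expressed in the equations below:

$$i_t^{(d)} = \sigma(Y_i^{(d)} c_t + X_i^{(d)} d_{t+1} + b_i^{(d)}) \tag{28}$$
$$f_t^{(d)} = \sigma(Y_f^{(d)} c_t + X_f^{(d)} d_{t+1} + b_f^{(d)}) \tag{29}$$
$$o_t^{(d)} = \sigma(Y_o^{(d)} c_t + X_o^{(d)} d_{t+1} + b_o^{(d)}) \tag{30}$$
$$u_t^{(d)} = \tanh(Y_u^{(d)} c_t + X_u^{(d)} d_{t+1} + b_u^{(d)}) \tag{31}$$
$$C_t^{(d)} = i_t^{(d)} \odot u_t^{(d)} + f_t^{(d)} \odot C_{t+1}^{(d)} \tag{32}$$
$$d_t = o_t^{(d)} \odot \tanh(C_t^{(d)}) \tag{33}$$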





The output $O_t$ at each time step $t$ is represented as the integrated hidden vector of the second
forward layer $b_t$ and the second backward layer $d_t$ [33]:

$$O_t = Y^{(O)} b_t + X^{(O)} d_t + b^{(O)} \tag{34}$$


In the proposed model, the hierarchical Bi-LSTM defines the contextual features from past and 
future time steps and integrates the features of both past and future time steps together as the 
output of the proposed hierarchical Bi-LSTM. 
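
The layer wiring described above (two stacked forward layers, two stacked backward layers, and a linear output that merges the two upper layers) can be sketched in plain Python. This is a sketch under stated assumptions, not the model used in the study: the parameters are random and untrained, the output offset is set to zero, and all function names and sizes are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_layer(in_dim, hid_dim, rng):
    """Input weights Y, recurrent weights X, and offsets b for gates i, f, o, u."""
    return {g: {"Y": random_matrix(hid_dim, in_dim, rng),
                "X": random_matrix(hid_dim, hid_dim, rng),
                "b": [0.0] * hid_dim} for g in ("i", "f", "o", "u")}


def lstm_layer(inputs, params, hid_dim, reverse=False):
    """Run one LSTM layer; reverse=True recurs on step t+1 instead of t-1."""
    steps = range(len(inputs) - 1, -1, -1) if reverse else range(len(inputs))
    h, c = [0.0] * hid_dim, [0.0] * hid_dim
    outputs = [None] * len(inputs)
    for t in steps:
        pre = {}
        for gate, p in params.items():
            from_input, from_state = matvec(p["Y"], inputs[t]), matvec(p["X"], h)
            pre[gate] = [a + r + k for a, r, k in zip(from_input, from_state, p["b"])]
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        u = [math.tanh(v) for v in pre["u"]]
        c = [ik * uk + fk * ck for ik, uk, fk, ck in zip(i, u, f, c)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        outputs[t] = h
    return outputs


def hierarchical_bilstm(z, in_dim, hid_dim, out_dim, seed=0):
    rng = random.Random(seed)
    a = lstm_layer(z, make_layer(in_dim, hid_dim, rng), hid_dim)                 # first forward
    b = lstm_layer(a, make_layer(hid_dim, hid_dim, rng), hid_dim)                # second forward
    c = lstm_layer(z, make_layer(in_dim, hid_dim, rng), hid_dim, reverse=True)   # first backward
    d = lstm_layer(c, make_layer(hid_dim, hid_dim, rng), hid_dim, reverse=True)  # second backward
    Y_out = random_matrix(out_dim, hid_dim, rng)
    X_out = random_matrix(out_dim, hid_dim, rng)
    # Output layer: merge the two upper layers at every step (offset taken as zero).
    return [[p + q for p, q in zip(matvec(Y_out, b_t), matvec(X_out, d_t))]
            for b_t, d_t in zip(b, d)]


z = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, 0.5]]
outputs = hierarchical_bilstm(z, in_dim=3, hid_dim=4, out_dim=5)
print(len(outputs), len(outputs[0]))
```

The sketch returns one output vector per time step; with `out_dim=5` each vector has one entry per emotion class.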

The algorithm of hierarchical Bi-LSTM for emotion Analysis is illustrated below: 


Algorithm 1 Hierarchical BiLSTM Model for Emotion Analysis

1: Input: Labelled Tweet Dataset
2: Preprocessing:
3:    Removal of redundant words
4:    Trimming the dataset
5:    Stemming the dataset
6: Extraction of textual data from tweet:
7: Extraction of adjectives from extracted tweets:
8: Combine step 1 and 2:
9:    Preprocessed file = file path name
10:   Stop word file list = path name of file
11:   Extracted tweets = file path of extracted tweets list
12: Control the data flow through BiLSTM gates
13: if
14:    the value of sigmoid function = 1
15:    the data is passed through the gates
16: else
17:    no data flow through the gates
18: Control the output of the sigmoid function
19: Emotion classification: obtain the textual feature, provide the textual feature as input to BiLSTM, and obtain the textual feature representation
20: Determine the output of Bi-LSTM emotion ($O_t$) for the input sequence $z_t$; classify the emotions ($O_t$) = {Sadness, Love, Joy, Fear, Anger}
21: end
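
The preprocessing steps of Algorithm 1 can be sketched as follows. This is only an illustration under stated assumptions: the stop-word set, the suffix list used for stemming, and the function names are hypothetical, and interpreting "trimming" as the removal of links, mentions, hashtags, and punctuation is a guess rather than the procedure used in this research.

```python
import re

# Hypothetical stop-word list; a real pipeline would load it from a file.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "was", "my"}

# Crude suffix stripping standing in for a proper stemmer (assumption).
SUFFIXES = ("ing", "ed", "ly", "s")


def clean_tweet(text):
    """Trim a tweet: lowercase it and drop links, mentions, hashtags, symbols."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # links
    text = re.sub(r"[@#]\w+", " ", text)       # mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # digits and punctuation
    return text.split()


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Remove redundant (stop) words, then stem the remaining tokens."""
    return [stem(word) for word in clean_tweet(text) if word not in STOP_WORDS]


tokens = preprocess("@united The flight was delayed AGAIN!! http://t.co/x")
print(tokens)  # ['flight', 'delay', 'again']
```

The resulting token list is the kind of textual feature that would then be embedded and passed to the Bi-LSTM.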
4.3 Sentiment Analysis


Every tweet expresses one of three sentiments: positive, negative, or neutral. The sentiment of
the user can be determined using the sentiment score, which is calculated from the positive and
negative word counts as shown in the equation below (Ruz et al., 2020) [23]:

$$ \text{Sentiment Score} = \frac{\text{positive} - \text{negative}}{\text{positive} + \text{negative} + 2} \tag{35} $$

where positive and negative denote the total counts of positive and negative words in a tweet,
respectively. The sentiment score is mapped to a discrete two-valued variable $C$ that represents
the sentiment class:





$$ C \in \{-1, 1\} $$


The variable $C$ summarizes the sentiment score values and the differences between them. In some
cases, the polarity value fails to reflect the degree of sentiment in the textual data because the
positive and negative counts cancel each other, which results in a zero sentiment score
(Sentiment Score = 0). The tweet may still be positive or negative rather than neutral, so a zero
sentiment score produces a false result. Hence, the following constraints are used to identify
positive and negative tweets.





$$ C = \begin{cases} 1 \ (\text{positive tweet}) & \text{if Sentiment Score} > 0.1 \\ -1 \ (\text{negative tweet}) & \text{if Sentiment Score} < 0.1 \end{cases} $$
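
As a worked illustration, the sentiment score and the class constraints can be computed directly from the word counts. The function names are hypothetical, and returning `None` when the score equals the threshold exactly is an assumption, since the constraints above do not cover that case.

```python
def sentiment_score(positive, negative):
    """Sentiment score from the counts of positive and negative words."""
    return (positive - negative) / (positive + negative + 2)


def sentiment_class(score, threshold=0.1):
    """Map a sentiment score to the class variable C using the 0.1 threshold."""
    if score > threshold:
        return 1    # positive tweet
    if score < threshold:
        return -1   # negative tweet
    return None     # score == threshold: not covered by the constraints


# Three positive words and one negative word: (3 - 1) / (3 + 1 + 2) = 1/3.
score = sentiment_score(3, 1)
label = sentiment_class(score)  # 1
```

Note that, with the constraints as written, a tweet whose positive and negative counts cancel (score 0) falls below the threshold and is therefore assigned to the negative class.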


When the tweets are given as input to the system, the polarity of each tweet is calculated to
identify whether the tweet is positive, negative, or neutral. Based on the results, the emotion
of the user is recognized, and the sentiment is predicted. The algorithm of the hierarchical
Bi-LSTM for sentiment analysis is given below:


Algorithm 2 Hierarchical BiLSTM Model for Sentiment Analysis

1: Input: Labelled Tweet Dataset
2: Preprocessing:
3:    Removal of redundant words
4:    Trimming the dataset
5:    Stemming the dataset
6: Extraction of textual data from tweet:
7: Extraction of adjectives from extracted tweets:
8: Combine step 1 and 2:
9:    Preprocessed file = file path name
10:   Stop word file list = path name of file
11:   Extracted tweets = file path of extracted tweets list
12: Get the synonym words and similarity words for extracted tweets
13: For each word in adjective list
14:    Extract adjectives in the tweet data ()
15:    For every positive word: a
16:    For every negative word: b
17: Set the sentiment score value to 0.1
18: Provide the textual feature as input to BiLSTM
19: Obtain the textual feature representation
20: Predict the sentiment based on the sentiment score
21: Print: sentiment polarity of the tweet data.
22:    Positive: if the sentiment score is 1
23:    Negative: if the sentiment score is -1
24:    Neutral: if the sentiment score is 0
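
The counting and labelling steps of Algorithm 2 can be sketched as below. The two word sets are hypothetical stand-ins for a real sentiment lexicon, the function names are invented for the example, and mapping the sign of the score to the labels 1, -1, and 0 is an assumption about how steps 22 to 24 are meant to be read.

```python
# Hypothetical mini-lexicons; a real system would load much larger word lists.
POSITIVE_WORDS = {"good", "great", "friendly", "comfortable"}
NEGATIVE_WORDS = {"bad", "late", "rude", "terrible"}


def count_polarity(tokens):
    """Count positive words (a) and negative words (b) in a token list."""
    a = sum(1 for word in tokens if word in POSITIVE_WORDS)
    b = sum(1 for word in tokens if word in NEGATIVE_WORDS)
    return a, b


def polarity(tokens):
    """Return 1 (positive), -1 (negative), or 0 (neutral) for a token list."""
    a, b = count_polarity(tokens)
    score = (a - b) / (a + b + 2)  # same form as the sentiment score
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0


print(polarity(["great", "crew", "late", "friendly"]))  # 1
```

In the proposed model the lexicon-based polarity accompanies the Bi-LSTM prediction; the sketch covers only the lexicon side.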


5 Result and Discussion


The performance of the proposed hierarchical Bi-LSTM was evaluated using performance metrics
such as F1 Score, Recall, Accuracy, and Support. The simulation results for emotion recognition
and sentiment analysis are presented in the sections below, and the effectiveness of the
proposed approach was measured using the confusion matrix.


5.0.1 Activation Functions 


The activation function is considered a building block of the neural network. Without activation
functions, a neural network performs only linear transformations, which is a significant
limitation because the network cannot learn complex patterns in the data. Activation functions
therefore apply nonlinear transformations that enhance the capability of the network to model
complex data patterns (Mourgias-Alexandris et al., 2019) [16]. The activation function adds a
computation step at every layer of the LSTM during forward propagation, which increases the
complexity of the network. In this research, the sigmoid function is used as the activation
function for the proposed hierarchical Bi-LSTM and is denoted as:


$$ S(x) = \frac{1}{1 + e^{-x}} \tag{36} $$
A sigmoid function is a bounded, differentiable real function that is defined for all real input
values and has a non-negative derivative at each point and exactly one inflection point. Sigmoid
functions are used in this research to analyze complex data for emotion and sentiment analysis.
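
The properties listed above can be checked numerically with a short sketch. The function names are illustrative, and the two-branch form is a common way to avoid floating-point overflow for large negative inputs, not something prescribed by this research.

```python
import math


def sigmoid(x):
    """Logistic sigmoid S(x) = 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x is negative, so exp(x) lies in (0, 1)
    return z / (1.0 + z)


def sigmoid_derivative(x):
    """Derivative S'(x) = S(x) * (1 - S(x)); it is never negative."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Bounded between 0 and 1, with the inflection point at x = 0.
print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
```

The derivative reaches its maximum of 0.25 at the inflection point and decays toward zero on both sides, which is the saturation behaviour the gates rely on.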


5.1 Confusion Matrix 


The confusion matrix is mainly used to assess classification accuracy when the output can belong
to two or more classes. For a binary problem, the confusion matrix is a table with four different
combinations of predicted and actual values, as shown in figure 6. TP represents True Positive,
FP is False Positive, FN is False Negative, and TN is True Negative. The confusion matrix in this
research is used for measuring Recall, Precision, Specificity, Accuracy, and the ROC curve [20].

                      | Predicted Positive (1)              | Predicted Negative (0)               | Total
Actual Positive (1)   | True Positive (TP)                  | False Negative (FN), Type II Error   | Actual Total Positive (P) = TP + FN
Actual Negative (0)   | False Positive (FP), Type I Error   | True Negative (TN)                   | Actual Total Negative (N) = FP + TN
Total                 | Predicted Total Positive = TP + FP  | Predicted Total Negative = FN + TN   |

Fig. 6 Confusion Matrix

The performance metrics are evaluated as:


- Accuracy:

$$ \text{Accuracy} = \frac{\text{Number of correctly labeled tweets}}{\text{Total number of tweets in the test set}} $$

  or

$$ \text{Accuracy} = \frac{TP + TN}{P + N} \tag{37} $$

  where P is the number of positive instances and N is the number of negative instances. TP is
  the number of instances that belong to the positive class and are correctly labeled, TN is the
  number of instances that belong to the negative class and are correctly labeled, and FP is the
  number of instances that belong to the negative class but are incorrectly labeled as positive.

- Precision: Precision is the fraction of the retrieved words that are relevant and is given as:

$$ \text{Precision} = \frac{TP}{TP + FP} \tag{38} $$


- Recall: Recall is the fraction of the relevant words that are retrieved and is given as:

$$ \text{Recall} = \frac{TP}{TP + FN} \tag{39} $$


- F1 Score: The F1 score, also called the F-measure, is the equally weighted harmonic mean of
  precision and recall [22]. It is used for measuring the accuracy of the system and takes values
  between 0 and 1, where 1 is the best value and 0 is the worst value. The F1 score is given as:

$$ F_1\ \text{Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{40} $$


  The ROC (Receiver Operating Characteristic) curve in this research is used to visualize the
  performance of the multi-class classification problem during emotion and sentiment analysis.
  The obtained Twitter data belongs to multiple classes, and the dimensionality of the Twitter
  data makes it challenging to classify with the desired accuracy. The ROC is a probability curve
  that shows the capability of the proposed classifier to distinguish between multiple classes.
  One axis of the ROC curve is the True Positive Rate (TPR), also called recall or sensitivity,
  as defined in equation 39. The other terms that define the ROC curve are Specificity and the
  False Positive Rate, which are given as:

- Specificity:

$$ \text{Specificity} = \frac{TN}{TN + FP} \tag{41} $$

- False positive rate (FPR):

$$ \text{FPR} = 1 - \text{Specificity} $$

  or

$$ \text{FPR} = \frac{FP}{TN + FP} \tag{42} $$
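
The metrics defined in equations 37 to 42 can be computed together from the four confusion-matrix counts. The sketch below is illustrative: the function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for the example.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the evaluation metrics from binary confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero (convention for this sketch).
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
        "fpr": ratio(fp, tn + fp),
    }


# Hypothetical counts: 40 TP, 10 FP, 20 FN, 30 TN.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
# accuracy = 70/100 = 0.70, precision = 40/50 = 0.80, recall = 40/60,
# specificity = 30/40 = 0.75, fpr = 10/40 = 0.25
```

For these counts the false positive rate equals one minus the specificity, as the two forms of the FPR definition require.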
Fig. 7 Loss Function of Hierarchical Bi-LSTM for Emotion Analysis
Fig. 8 Accuracy of Hierarchical Bi-LSTM for Emotion Analysis


Table 1 Classification Data of Hierarchical Bi-LSTM for Emotion Analysis


Precision | Recall | F1 Score | Support


5.2 Hierarchical Bi-LSTM for Emotion Analysis


The loss function and accuracy of the hierarchical Bi-LSTM are given in figures 7 and 8, respectively.
It can be observed from these figures that the loss decreases significantly as the hierarchical
Bi-LSTM model is trained. The improvement in accuracy can be observed from figure 8, where the
trained Bi-LSTM model achieved an accuracy of approximately 0.88. The classification data for
emotion analysis is given in table 1. The confusion matrix of the hierarchical Bi-LSTM for emotion
analysis is illustrated in figure 9. The confusion matrix was constructed for five emotions
(sadness, love, joy, fear, and anger), and the values for the true label were plotted against the
predicted label. The Receiver Operating Characteristic (ROC) curve for this confusion matrix is
presented in figure 10. The ROC curve plots the true positive rate against the false positive rate.
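
For a multi-class problem, the true and false positive rates are commonly obtained per class by treating each emotion as "positive" and all other emotions as "negative" (one-vs-rest). The sketch below illustrates this under stated assumptions: the function name is hypothetical and the matrix counts are invented for the example, not taken from figure 9. A single confusion matrix yields one operating point per class; a full ROC curve additionally requires the classifier scores at varying thresholds.

```python
def one_vs_rest_rates(matrix, labels):
    """Per-class (TPR, FPR) from a confusion matrix with rows = true labels."""
    total = sum(sum(row) for row in matrix)
    rates = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                 # true class k, predicted other
        fp = sum(row[k] for row in matrix) - tp  # predicted k, true class other
        tn = total - tp - fn - fp
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        rates[label] = (tpr, fpr)
    return rates


# Hypothetical counts, ten test tweets per emotion.
labels = ["sadness", "love", "joy", "fear", "anger"]
matrix = [
    [8, 0, 1, 1, 0],
    [0, 7, 3, 0, 0],
    [1, 2, 7, 0, 0],
    [1, 0, 0, 8, 1],
    [0, 0, 0, 2, 8],
]
rates = one_vs_rest_rates(matrix, labels)
# For "sadness": TPR = 8 / 10 = 0.8 and FPR = 2 / 40 = 0.05
```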
Fig. 9 Confusion Matrix of Hierarchical Bi-LSTM for Emotional Analysis
Fig. 10 ROC of Hierarchical Bi-LSTM for Emotion Analysis


5.3 Hierarchical Bi-LSTM for Sentiment Analysis


The loss function and accuracy of the hierarchical Bi-LSTM are given in figures 11 and 12, respectively.
It can be observed from these figures that the loss function reaches a lower value after training
the hierarchical Bi-LSTM model. The increase in accuracy can be observed from figure 12, where the
trained Bi-LSTM model achieved an average accuracy of 0.75. The classification data for sentiment
analysis is given in table 2. The confusion matrix of the hierarchical Bi-LSTM for sentiment
analysis is illustrated in figure 13. The confusion matrix was constructed for three sentiments:
positive, negative, and neutral. The values for the true label were plotted against the predicted
label. The Receiver Operating Characteristic (ROC) curve for this confusion matrix is presented in
figure 14. The ROC curve plots the true positive rate against the false positive rate.


Fig. 11 Loss Function of Hierarchical BiLSTM for Sentiment Analysis
Fig. 12 Accuracy of Hierarchical Bi-LSTM for Sentiment Analysis


Table 2 Classification Data of Hierarchical Bi-LSTM for Sentiment Analysis


Fig. 13 Confusion Matrix of Hierarchical Bi-LSTM for Sentiment Analysis
Fig. 14 ROC of Hierarchical Bi-LSTM for Sentiment Analysis


5.4 Hybrid CNN-LSTM for Emotion Analysis 


The loss function and accuracy of the Hybrid CNN-LSTM are given in figures 15 and 16, respectively.
It can be observed from these figures that the loss function reaches a lower value for the trained
Hybrid CNN-LSTM model. The classification data for emotion analysis is given in table 3. The
confusion matrix of the Hybrid CNN-LSTM for emotion analysis is illustrated in figure 17. Similar
to emotion analysis using the hierarchical Bi-LSTM, the confusion matrix using the Hybrid CNN-LSTM
was constructed for five emotions (sadness, love, joy, fear, and anger), and the values for the
true label were plotted against the predicted label. The Receiver Operating Characteristic (ROC)
curve for this confusion matrix is presented in figure 18. The ROC curve plots the true positive
rate against the false positive rate.


Fig. 15 Loss Function of Hybrid CNN-LSTM for Emotion Analysis
Fig. 16 Accuracy of Hybrid CNN-LSTM for Emotion Analysis


Table 3 Classification Data of Hybrid CNN-LSTM for Emotion Analysis


Fig. 17 Confusion Matrix of Hybrid CNN-LSTM for Emotion Analysis
Fig. 18 ROC of Hybrid CNN-LSTM for Emotion Analysis


5.5 Hybrid CNN-LSTM for Sentiment Analysis


The loss function and accuracy of the Hybrid CNN-LSTM are given in figures 19 and 20, respectively.
It can be observed from these figures that the loss function declines after training the CNN-LSTM
model. The classification data for sentiment analysis is given in table 4. The confusion matrix
of the Hybrid CNN-LSTM for sentiment analysis is illustrated in figure 21. The confusion matrix
was constructed for three sentiments: positive, negative, and neutral. The values for the true
label were plotted against the predicted label. The Receiver Operating Characteristic (ROC) curve
for this confusion matrix is presented in figure 22. The ROC curve plots the true positive rate
against the false positive rate.


Fig. 19 Loss Function of Hybrid CNN-LSTM for Sentiment Analysis
Fig. 20 Accuracy of Hybrid CNN-LSTM for Sentiment Analysis


Table 4 Classification Data of Hybrid CNN-LSTM for Sentiment Analysis


Precision | Recall | F1 Score | Support


Fig. 21 Confusion Matrix of Hybrid CNN-LSTM for Sentiment Analysis
Fig. 22 ROC of Hybrid CNN-LSTM for Sentiment Analysis


5.6 Performance Comparison 


The performance of the proposed Bi-LSTM is compared with the existing CNN-LSTM approach,
and the results are tabulated above. The proposed Bi-LSTM achieves an average accuracy of 88.48%
for emotion analysis and 75.80% for sentiment analysis, whereas the existing CNN-LSTM method
achieves an overall accuracy of 83.0% and 74.0%, respectively. The proposed approach therefore
improves accuracy over CNN-LSTM by 5.48 percentage points for emotion analysis and by 1.80
percentage points for sentiment analysis.





6 Applications of Emotion and Sentiment Classification


Prominent applications of sentiment classification are:


- Sentiment analysis is mainly employed for classifying sentences, paragraphs, and documents
  as positive, negative, or neutral.

- Sentiment analysis is used in commercial applications such as multimedia systems for analyzing
  movie reviews, news articles, restaurant reviews, and mobile customer reviews (Medhat et al.,
  2014) [15].

- Complex sentiment classification is used to extract meaning from sentences, to classify
  intent, and in linguistics-based emotion analysis.

- Product manufacturing companies widely use sentiment analysis for obtaining accurate product
  reviews based on customer ratings.

- Emotion Detection (ED), Building Resources (BR), and Transfer Learning (TL) are prominent
  fields in which sentiment prediction is used.

- Certain semi-supervised machine learning algorithms are used for creating artificial datasets
  using sentiment analysis.


Prominent applications of emotion classification are:


- Emotion analysis and emotion classification are used in various psychological applications to
  determine the present state of an individual.

- Software engineering applications such as usability testing and development process improvement
  require emotion analysis and recognition to understand the influence of the products on customers
  (Kołakowska et al., 2014) [11].

- Emotion analysis is widely used in modern educational systems to understand the emotional
  states of the learners.

- Another significant application of emotion classification is the customization of websites,
  where internet data is used to determine the profile of the user. Adding information about the
  emotions of users provides more accurate personality models.

- Emotion analysis in video games helps to understand the emotions of players. Based on those
  emotions, entertaining aspects such as gameplay, immersive storytelling, novelty, and graphics
  can be added to enhance the game.

- On social media websites, emotion analysis is often used to analyze the mood and interests of
  the users.





7 Conclusion 


The goal of the proposed approach was to develop an efficient EDNN-based emotion analysis and
sentiment prediction model that analyses users' emotions using textual data from social media
platforms. The data for experimental evaluation was collected from the microblogging site Twitter.
This research proposed a hierarchical Bi-LSTM for emotion analysis and sentiment prediction. The
performance of the hierarchical Bi-LSTM model was evaluated on the collected data using
performance metrics such as F1 Score, Recall, Accuracy, and Support. The analysis was conducted
for two tasks: emotion analysis and sentiment prediction. Emotion analysis was performed for five
emotions: sadness, love, joy, fear, and anger. Sentiment analysis was performed for three classes:
negative, positive, and neutral. The performance of the proposed approach was compared with the
existing hybrid CNN-LSTM method. The results validate the efficacy of the proposed technique, which
achieved higher accuracy for both emotion analysis and sentiment prediction.